CEFR level
CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning
Kikuchi, Masato, Ono, Masatsugu, Soga, Toshioki, Tanabe, Tetsu, Ozono, Tadachika
Although WordNet is a valuable resource owing to its structured semantic networks and extensive vocabulary, its fine-grained sense distinctions can be challenging for second-language learners. To address this, we developed a WordNet annotated with the Common European Framework of Reference for Languages (CEFR), integrating its semantic networks with language-proficiency levels. We automated this process using a large language model to measure the semantic similarity between sense definitions in WordNet and entries in the English Vocabulary Profile Online. To validate our method, we constructed a large-scale corpus containing both sense and CEFR-level information from our annotated WordNet and used it to develop contextual lexical classifiers. Our experiments demonstrate that models fine-tuned on our corpus perform comparably to those trained on gold-standard annotations. Furthermore, by combining our corpus with the gold-standard data, we developed a practical classifier that achieves a Macro-F1 score of 0.81, indicating the high accuracy of our annotations. Our annotated WordNet, corpus, and classifiers are publicly available to help bridge the gap between natural language processing and language education, thereby facilitating more effective and efficient language learning.
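The core annotation step described above, matching each WordNet sense definition to the closest English Vocabulary Profile entry by semantic similarity, can be sketched as follows. The paper measures similarity with a large language model; this illustration substitutes a toy bag-of-words cosine similarity, and the EVP entries shown are hypothetical examples, not actual EVP data.

```python
from collections import Counter
import math

def bow_vector(text):
    """Lowercase bag-of-words vector (a stand-in for an LLM-based embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def assign_cefr(sense_gloss, evp_entries):
    """Return the CEFR level of the EVP entry whose definition is most
    similar to the WordNet sense gloss."""
    gloss_vec = bow_vector(sense_gloss)
    best = max(evp_entries,
               key=lambda e: cosine(gloss_vec, bow_vector(e["definition"])))
    return best["level"]

# Hypothetical EVP-style entries for the headword "bank"
evp = [
    {"definition": "an organization that keeps money for people", "level": "A1"},
    {"definition": "the land along the side of a river", "level": "B1"},
]
level = assign_cefr("sloping land beside a body of water such as a river", evp)
```

Replacing `cosine` over bag-of-words vectors with LLM-judged similarity, as the paper does, keeps the same matching structure while handling paraphrase and polysemy far better.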
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties
Lee, Jiyoung, Kim, Seungho, Han, Jieun, Lee, Jun-Min, Kim, Kitaek, Oh, Alice, Choi, Edward
Large Language Models (LLMs) are predominantly evaluated on Standard American English (SAE), often overlooking the diversity of global English varieties. This narrow focus may raise fairness concerns as degraded performance on non-standard varieties can lead to unequal benefits for users worldwide. Therefore, it is critical to extensively evaluate the linguistic robustness of LLMs on multiple non-standard English varieties. We introduce Trans-EnV, a framework that automatically transforms SAE datasets into multiple English varieties to evaluate the linguistic robustness. Our framework combines (1) linguistics expert knowledge to curate variety-specific features and transformation guidelines from linguistic literature and corpora, and (2) LLM-based transformations to ensure both linguistic validity and scalability. Using Trans-EnV, we transform six benchmark datasets into 38 English varieties and evaluate seven state-of-the-art LLMs. Our results reveal significant performance disparities, with accuracy decreasing by up to 46.3% on non-standard varieties. These findings highlight the importance of comprehensive linguistic robustness evaluation across diverse English varieties. Each construction of Trans-EnV was validated through rigorous statistical testing and consultation with a researcher in the field of second language acquisition, ensuring its linguistic validity. Our code and datasets are publicly available at https://github.com/jiyounglee-0523/TransEnV and https://huggingface.co/collections/jiyounglee0523/transenv-681eadb3c0c8cf363b363fb1.
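Step (2) of the framework, composing curated variety-specific feature guidelines into an LLM transformation prompt, might be structured as in the sketch below. The variety name and feature entry are placeholders for illustration, not items from the actual Trans-EnV inventory.

```python
def build_transform_prompt(sae_text, variety, features):
    """Compose an LLM prompt that rewrites SAE text into a target variety,
    constrained to documented, linguistically curated features."""
    guidelines = "\n".join(
        f"- {f['name']}: {f['guideline']}" for f in features[variety])
    return (
        f"Rewrite the following Standard American English text as {variety}, "
        f"applying only these documented features:\n{guidelines}\n\n"
        f"Text: {sae_text}"
    )

# Hypothetical feature inventory for one variety
features = {
    "Variety X": [
        {"name": "zero copula",
         "guideline": "omit 'is/are' in present-tense predicates "
                      "where the variety allows it"},
    ],
}
prompt = build_transform_prompt("She is very busy today.", "Variety X", features)
```

Keeping the feature inventory as data separate from the prompt template is what makes the approach scale to 38 varieties: each new variety only adds rows to the inventory.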
An Effective Strategy for Modeling Score Ordinality and Non-uniform Intervals in Automated Speaking Assessment
Lo, Tien-Hong, Chen, Szu-Yu, Sung, Yao-Ting, Chen, Berlin
A recent line of research on automated speaking assessment (ASA) has benefited from self-supervised learning (SSL) representations, which capture rich acoustic and linguistic patterns in non-native speech without underlying assumptions of feature curation. However, speech-based SSL models capture acoustic-related traits but overlook linguistic content, while text-based SSL models rely on ASR output and fail to encode prosodic nuances. Moreover, most prior art treats proficiency levels as nominal classes, ignoring their ordinal structure and the non-uniform intervals between proficiency labels. To address these limitations, we propose an effective ASA approach combining SSL with handcrafted indicator features via a novel modeling paradigm. We further introduce a multi-margin ordinal loss that jointly models both the score ordinality and non-uniform intervals of proficiency labels. Extensive experiments on the TEEMI corpus show that our method consistently outperforms strong baselines and generalizes well to unseen prompts.
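One common threshold-based way to realize an ordinal loss with per-boundary margins, of the kind the abstract describes, is sketched below. The paper's exact formulation may differ; this is an illustrative hinge-style variant in which unequal margins encode the non-uniform intervals between proficiency labels.

```python
def multi_margin_ordinal_loss(score, label, thresholds, margins):
    """Hinge-style ordinal loss with per-boundary margins (illustrative).

    thresholds[k] separates level k from level k+1 on the score axis;
    margins[k] may differ across boundaries, so adjacent proficiency
    levels need not be equally spaced.
    """
    loss = 0.0
    for k, (b, m) in enumerate(zip(thresholds, margins)):
        # The score should sit at least m above b if the true label is
        # beyond boundary k, and at least m below b otherwise.
        direction = 1.0 if label > k else -1.0
        loss += max(0.0, m - direction * (score - b))
    return loss
```

For example, with thresholds `[1.0, 2.5]` and margins `[0.5, 1.0]`, a score of 1.8 for label 1 clears the lower boundary but sits only 0.7 below the upper one, incurring a loss of 0.3 from the second margin alone.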
UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment
Imperial, Joseph Marvin, Barayan, Abdullah, Stodden, Regina, Wilkens, Rodrigo, Sanchez, Ricardo Munoz, Gao, Lingyun, Torgbi, Melissa, Knight, Dawn, Forey, Gail, Jablonkai, Reka R., Kochmar, Ekaterina, Reynolds, Robert, Ribeiro, Eugénio, Saggion, Horacio, Volodina, Elena, Vajjala, Sowmya, François, Thomas, Alva-Manchego, Fernando, Madabushi, Harish Tayyar
We introduce UniversalCEFR, a large-scale multilingual and multidimensional dataset of texts annotated with CEFR (Common European Framework of Reference) levels in 13 languages. To enable open research in automated readability and language proficiency assessment, UniversalCEFR comprises 505,807 CEFR-labeled texts curated from educational and learner-oriented resources, standardized into a unified data format to support consistent processing, analysis, and modelling across tasks and languages. To demonstrate its utility, we conduct benchmarking experiments using three modelling paradigms: a) linguistic feature-based classification, b) fine-tuning pre-trained LLMs, and c) descriptor-based prompting of instruction-tuned LLMs. Our results support using linguistic features and fine-tuning pretrained models in multilingual CEFR level assessment. Overall, UniversalCEFR aims to establish best practices in data distribution for language proficiency research by standardising dataset formats, and promoting their accessibility to the global research community.
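A unified record format of the kind the abstract describes might look like the following sketch. The field names here are assumptions for illustration, not the dataset's actual schema; consult the released UniversalCEFR data for the real format.

```python
from dataclasses import dataclass, asdict

@dataclass
class CEFRRecord:
    """One possible shape for a standardized CEFR-labeled text record."""
    text: str          # the annotated text itself
    cefr_level: str    # one of A1, A2, B1, B2, C1, C2
    language: str      # ISO 639-1 code of the text's language
    source: str        # originating educational or learner resource
    dimension: str     # e.g. readability vs. learner production

rec = CEFRRecord(
    text="Der Hund schläft.",
    cefr_level="A1",
    language="de",
    source="example-corpus",
    dimension="readability",
)
```

A flat, typed record like this is what lets one pipeline process 13 languages consistently: every downstream model sees the same fields regardless of where a text was curated from.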
Towards Ontology-Based Descriptions of Conversations with Qualitatively-Defined Concepts
Gendron, Barbara, Guibon, Gaël, D'aquin, Mathieu
The controllability of Large Language Models (LLMs) when used as conversational agents is a key challenge, particularly to ensure predictable and user-personalized responses. This work proposes an ontology-based approach to formally define conversational features that are typically qualitative in nature. By leveraging a set of linguistic descriptors, we derive quantitative definitions for qualitatively-defined concepts, enabling their integration into an ontology for reasoning and consistency checking. We apply this framework to the task of proficiency-level control in conversations, using CEFR language proficiency levels as a case study. These definitions are then formalized in description logic and incorporated into an ontology, which guides controlled text generation of an LLM through fine-tuning. Experimental results demonstrate that our approach provides consistent and explainable proficiency-level definitions, improving transparency in conversational AI.
Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring
Almasi, Mina, Kristensen-McLachlan, Ross Deans
This paper investigates the potential of Large Language Models (LLMs) as adaptive tutors in the context of second-language learning. In particular, we evaluate whether system prompting can reliably constrain LLMs to generate only text appropriate to the student's competence level. We simulate full teacher-student dialogues in Spanish using instruction-tuned, open-source LLMs ranging in size from 7B to 12B parameters. Dialogues are generated by having an LLM alternate between tutor and student roles with separate chat histories. The output from the tutor model is then used to evaluate the effectiveness of CEFR-based prompting to control text difficulty across three proficiency levels (A1, B1, C1). Our findings suggest that while system prompting can be used to constrain model outputs, prompting alone is too brittle for sustained, long-term interactional contexts, a phenomenon we term alignment drift. Our results provide insights into the feasibility of LLMs as personalized, proficiency-aligned adaptive tutors and offer a scalable method for low-cost evaluation of model performance without human participants.
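The dialogue-simulation setup described above, one LLM alternating between tutor and student roles with separate chat histories, can be sketched as follows. The model calls are stubbed and all names are illustrative, not the paper's actual implementation.

```python
def simulate_dialogue(tutor_llm, student_llm, level, turns):
    """Alternate tutor/student turns; each role keeps its own chat history."""
    tutor_history = [{"role": "system",
                      "content": f"You are a Spanish tutor. "
                                 f"Use only CEFR {level} language."}]
    student_history = [{"role": "system",
                        "content": f"You are a Spanish learner at CEFR {level}."}]
    tutor_outputs = []
    message = "Hola, ¿empezamos?"
    for _ in range(turns):
        tutor_history.append({"role": "user", "content": message})
        reply = tutor_llm(tutor_history)      # tutor sees only its own history
        tutor_history.append({"role": "assistant", "content": reply})
        tutor_outputs.append(reply)           # later scored for level drift
        student_history.append({"role": "user", "content": reply})
        message = student_llm(student_history)
        student_history.append({"role": "assistant", "content": message})
    return tutor_outputs

# Stub models for illustration; real runs would call instruction-tuned LLMs
def tutor_stub(history): return f"t{len(history)}"
def student_stub(history): return f"s{len(history)}"
outputs = simulate_dialogue(tutor_stub, student_stub, "B1", turns=2)
```

Collecting only the tutor-side outputs is the key design point: drift is measured on what the tutor produces over time, while the simulated student merely keeps the conversation going.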
Exploiting the English Vocabulary Profile for L2 word-level vocabulary assessment with LLMs
Bannò, Stefano, Knill, Kate, Gales, Mark
Vocabulary use is a fundamental aspect of second language (L2) proficiency. To date, its assessment by automated systems has typically examined the context-independent, or part-of-speech (PoS) related use of words. This paper introduces a novel approach to enable fine-grained vocabulary evaluation exploiting the precise use of words within a sentence. The scheme combines large language models (LLMs) with the English Vocabulary Profile (EVP). The EVP is a standard lexical resource that enables in-context vocabulary use to be linked with proficiency level. We evaluate the ability of LLMs to assign proficiency levels to individual words as they appear in L2 learner writing, addressing key challenges such as polysemy, contextual variation, and multi-word expressions. We compare LLMs to a PoS-based baseline. LLMs appear to exploit additional semantic information that yields improved performance. We also explore correlations between word-level proficiency and essay-level proficiency. Finally, the approach is applied to examine the consistency of the EVP proficiency levels. Results show that LLMs are well-suited for the task of vocabulary assessment.
ARWI: Arabic Write and Improve
Chirkunov, Kirill, Alhafni, Bashar, Qwaider, Chatrine, Habash, Nizar, Briscoe, Ted
Although Arabic is spoken by over 400 million people, advanced Arabic writing assistance tools remain limited. To address this gap, we present ARWI, a new writing assistant that helps learners improve essay writing in Modern Standard Arabic. ARWI is the first publicly available Arabic writing assistant to include a prompt database for different proficiency levels, an Arabic text editor, state-of-the-art grammatical error detection and correction, and automated essay scoring aligned with the Common European Framework of Reference standards for language attainment. Moreover, ARWI can be used to gather a growing auto-annotated corpus, facilitating further research on Arabic grammar correction and essay scoring, as well as profiling patterns of errors made by native speakers and non-native learners. A preliminary user study shows that ARWI provides actionable feedback, helping learners identify grammatical gaps, assess language proficiency, and guide improvement.
Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection
Qwaider, Chatrine, Alhafni, Bashar, Chirkunov, Kirill, Habash, Nizar, Briscoe, Ted
Automated Essay Scoring (AES) plays a crucial role in assessing language learners' writing quality, reducing grading workload, and providing real-time feedback. Arabic AES systems are particularly challenged by the lack of annotated essay datasets. This paper presents a novel framework leveraging Large Language Models (LLMs) and Transformers to generate synthetic Arabic essay datasets for AES. We prompt an LLM to generate essays across CEFR proficiency levels and introduce controlled error injection using a fine-tuned Standard Arabic BERT model for error type prediction. Our approach produces realistic human-like essays, contributing a dataset of 3,040 annotated essays. Additionally, we develop a BERT-based auto-marking system for accurate and scalable Arabic essay evaluation. Experimental results demonstrate the effectiveness of our framework in improving Arabic AES performance.
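The controlled error-injection step can be illustrated with the simplified sketch below. The paper selects error types with a fine-tuned Standard Arabic BERT model; this toy version instead samples from hand-written character-level operations at a fixed rate, and all names are illustrative.

```python
import random

def inject_errors(tokens, error_rate, error_ops, seed=0):
    """Corrupt a clean synthetic essay at a controlled rate, recording
    which tokens were changed and by which operation."""
    rng = random.Random(seed)   # seeded for reproducible corpora
    out, injected = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < error_rate:
            op_name, op = rng.choice(list(error_ops.items()))
            out.append(op(tok))
            injected.append((i, op_name))   # annotation for the AES corpus
        else:
            out.append(tok)
    return out, injected

# Hypothetical character-level error operations
ops = {
    "deletion": lambda t: t[:-1] if len(t) > 1 else t,
    "doubling": lambda t: t + t[-1],
}
tokens = "hello world".split()
corrupted, injected = inject_errors(tokens, error_rate=1.0, error_ops=ops)
```

Recording the injected positions and types is what turns the synthetic essays into annotated training data rather than mere noise.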
Assessing the validity of new paradigmatic complexity measures as criterial features for proficiency in L2 writings in English
Mallart, Cyriel, Simpkin, Andrew, Ballier, Nicolas, Lissón, Paula, Venant, Rémi, Li, Jen-Yu, Stearns, Bernardo, Gaillat, Thomas
This article addresses Second Language (L2) writing development through an investigation of new grammatical and structural complexity metrics. We explore paradigmatic production in learner English by linking language functions to specific grammatical paradigms. Using the EFCAMDAT as a gold standard and a corpus of French learners as an external test set, we employ a supervised learning framework to operationalise and evaluate seven microsystems (MS). We show that learner levels are associated with the seven microsystems. Evaluation with ordinal regression modelling shows that each MS is significant but has little impact taken individually; taken as a group, however, their influence is substantial. These microsystems and their measurement method suggest that they can serve as components of broader-purpose CALL systems focused on proficiency assessment.